How to Use Learn-To-Rank Features

NOTE: Learn-To-Rank works only with SmartAnalytics v5.0 and later versions.

About Learn-To-Rank
Learn-To-Rank Page
Features Configuration
Learn-To-Rank Trainer
Training Results
Features

About Learn-To-Rank

The Learn-To-Rank component is implemented on top of SmartAnalytics data to support four main features:

Results Boosting
User segmentation boosting
Query suggestions
Content suggestions

Learn-to-Rank uses clustering algorithms (K-means, but extendable to any other algorithm) to build clusters of similar (related) queries and their associated “important” documents.

Learn-To-Rank groups queries and documents based on query similarity as well as user interaction with documents after a query is executed.
These groups are called "clusters".

In a cluster, there are:

Queries that contain similar keywords
Documents that those queries led to
Other queries that are not necessarily similar in keywords, but led to the same documents as other queries
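
As a rough illustration of the first kind of cluster member (queries with similar keywords), the sketch below groups queries by keyword overlap. It is not the product's K-means implementation: it swaps in a simple greedy grouping on Jaccard similarity, and the function names, the sample queries, and the 0.3 threshold are assumptions made for this example.

```python
from __future__ import annotations


def keywords(query: str) -> set[str]:
    """Lower-case a query and split it into a set of keywords."""
    return set(query.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of keywords two queries have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_queries(queries: list[str], min_similarity: float = 0.3) -> list[list[str]]:
    """Greedy grouping: a query joins the first group whose seed (first)
    query is similar enough; otherwise it starts a new group."""
    groups: list[list[str]] = []
    for query in queries:
        for group in groups:
            if jaccard(keywords(query), keywords(group[0])) >= min_similarity:
                group.append(query)
                break
        else:
            groups.append([query])
    return groups


# Invented sample queries, for illustration only.
sample = [
    "diabetes treatment",
    "treatment for diabetes",
    "biomedical research",
    "diabetes treatment options",
]
print(group_queries(sample))
```

Queries that share no keywords but led to the same documents are not covered by this sketch; they depend on user interaction data rather than on query text.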

Learn-To-Rank Page

The Learn-To-Rank page is in the left panel of the SmartHub Administration page (https://<hostname>:<port>/_admin).

The page is composed of two parts:

Features Configuration
Training Results (data)

Features Configuration

In the table below, you will find the Learn-To-Rank settings as they appear in the SmartHub interface. See the Training Results section below for more details and examples of the settings shown here.

These settings must be fine-tuned over time to fit your environment and data.
Properly tuned settings reveal the most relevant documents for a given cluster and search query.

The user segmentation fields configured in the Learn-to-Rank settings must be present in the analytics metadata when running the Learn-to-Rank job. If they are not, the user segmentation fields in the clustered index will be empty, meaning the segmentation boost will not apply and learn-to-rank will default to boosting by hit count.

Setting	Default	Description
Learn to Rank Index Name	learntorank-cluster-storage	Name of the Learn-To-Rank index used for cluster storage. Note: If you change the index name after an index has already been created, the old index is not removed from Elastic.
Number of days to go back for data	30000	How far back, in days, to read query analytics data. This controls the freshness of the data.
Number of actions threshold	20	The minimum number of executed queries required for data to be used for training. This number must reflect the data in your environment: a production environment has thousands or even millions of documents.
Max number of clusters	10	The maximum number of clusters to split your data into. This number must reflect the data in your environment: a production environment has thousands or even millions of documents.
Number of documents to be boosted	2	The number of documents to boost when your search query can be assigned to a cluster. For a given cluster (see the example screenshot in "Training Results," below), a value of 2 boosts the top two documents in the list of documents shown. The boost value for each document is a constant boost (cb) value between 1 and 100, calculated from how many actions the document has relative to the maximum number of actions in the same cluster. If a document has 1.5 times more (or fewer) actions than another document in the cluster, its boost is higher (or lower) by the same factor. A document with half (50%) of the actions of the top-most document is boosted half as much (50 vs 100). A query for any of the query terms in a cluster shows the boosted documents first in the search results. So even the query
Data cache sliding expiration in minutes	30 minutes	Sliding expiration of data to be stored in cache.
Search engine URL property used for boosting	clickUri	The property that your backend uses for path.
User segmentation fields and weights	LTR Segmentation Field Name, User Profile Field Mapping, 1	Specifies the user segmentation boosting settings, which boost documents that were interacted with by users with similar classifications, such as role, location, or job type. The format for this setting is: LTR Segmentation Field Name, User Profile Field Mapping (optional), Weight (optional; the default is 1). To specify multiple Learn-to-Rank segmentation fields, separate them with a semicolon (;). For example: Department,Department,10;JobTitle,SPS-JobTitle,5
Clear cache	N/A	Clear cache for the existing clusters
Scheduled Task Name	BAInsight Learn to Rank Scheduler	The name of the scheduled task used to run the LTR trainer
Scheduled Task Run Interval	7	The interval, in days, at which the data is retrained. For example, a value of 7 retrains the data every 7 days.
Enable	true	Set this to true to train the data automatically.
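
The format of the User segmentation fields and weights setting can be made concrete with a small parser. This is only a sketch of the documented format, not SmartHub code: the class and function names are hypothetical, and the fallback of an omitted mapping to the field name is an assumption, since the table only states that the mapping is optional.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SegmentationField:
    name: str            # LTR segmentation field name
    profile_field: str   # user profile field mapped to it
    weight: float = 1.0  # documented default weight is 1


def parse_segmentation(setting: str) -> list[SegmentationField]:
    """Parse 'Name,Mapping,Weight;Name,Mapping,Weight' into objects."""
    fields: list[SegmentationField] = []
    for entry in setting.split(";"):
        parts = [part.strip() for part in entry.split(",")]
        if not parts[0]:
            continue  # skip empty entries, such as after a trailing ';'
        name = parts[0]
        # Assumption: an omitted mapping falls back to the field name.
        profile_field = parts[1] if len(parts) > 1 and parts[1] else name
        weight = float(parts[2]) if len(parts) > 2 and parts[2] else 1.0
        fields.append(SegmentationField(name, profile_field, weight))
    return fields


for item in parse_segmentation("Department,Department,10;JobTitle,SPS-JobTitle,5"):
    print(item)
```

With the example value from the table, the sketch yields a Department field with weight 10 and a JobTitle field mapped to SPS-JobTitle with weight 5.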

Learn-To-Rank Trainer

The trainer takes the data from SmartAnalytics and creates the clusters (shown in the screenshot in section "Training Results" below).

The trainer is in the SmartHub package under /Scheduled Jobs/LearnToRank/Task, and its settings are in the file BAInsight.LearnToRank.Trainer.exe.config. See the code below.
If you change the paths of the logger configuration file folder (/Caching), you must also change them in the trainer configuration.

  <appSettings>
    <add key="LoggingFile" value="Logs.xml" />
    <add key="LoggingOutputDir" value=".\Logs\" />
    <add key="log4net.Config.Watch" value="True" />
    <add key="ConfigFolder" value="../../../Configuration/" />
    <add key="OAuthFolder" value="../../../OAuth/" />
    <add key="CachingFolder" value="../../../Caching/" />
  </appSettings>
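
To check which folder paths the trainer is configured with, the appSettings section can be read with a few lines of Python. This is a hedged sketch using only the standard library: it parses the fragment shown above, pasted in as a string, and the function name is hypothetical. The layout of the rest of the real .config file is not shown in this document, so the sketch accepts either a bare fragment or a document that contains one.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# The appSettings fragment from the trainer configuration shown above.
APP_SETTINGS = """
<appSettings>
  <add key="LoggingFile" value="Logs.xml" />
  <add key="LoggingOutputDir" value=".\\Logs\\" />
  <add key="log4net.Config.Watch" value="True" />
  <add key="ConfigFolder" value="../../../Configuration/" />
  <add key="OAuthFolder" value="../../../OAuth/" />
  <add key="CachingFolder" value="../../../Caching/" />
</appSettings>
"""


def read_app_settings(xml_text: str) -> dict[str, str]:
    """Return the key/value pairs of the <appSettings> section."""
    root = ET.fromstring(xml_text.strip())
    # Accept a bare <appSettings> fragment or a document containing one.
    node = root if root.tag == "appSettings" else root.find(".//appSettings")
    if node is None:
        return {}
    return {add.get("key", ""): add.get("value", "") for add in node.findall("add")}


settings = read_app_settings(APP_SETTINGS)
# List only the folder settings, which must stay in sync with SmartHub.
folders = {key: value for key, value in settings.items() if key.endswith("Folder")}
for key, value in folders.items():
    print(f"{key} = {value}")
```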
                                                

If the installation is successful, a new scheduled task is created in the Windows Task Scheduler.
The name of the task is the one specified in the Scheduled Task Name field from the Learn-To-Rank page.

Training Results

Clusters might contain documents that have been deleted from the search index.

This is because the Analytics index still contains them.
Over time, they disappear from this list as usage of other documents increases.
To accelerate the process, you can manually delete them from the Analytics index and retrain the data.

Sample Training results are shown below.

Clusters - #1, #2, #3, #4
- Each cluster consists of:
  - A series of queries, shown on the left
  - Document URLs for each query listed on the right
  - The number of actions taken on each document (download, opening, previewing, etc.), shown on the far right
- Queries are grouped into clusters based on their keyword similarity
Bold text - policy matterid=333056, diabetes treatment, biomedical research, albert gore
- Original cluster query
- The documents this query led to are added to the cluster if at least a threshold number of actions were taken on them
  - This threshold is set in the table above, in Number of actions threshold.
  - In this sample, it is set to a very low value of 4 because the data set is small.
- For example, in Cluster #1, the query text policy matterid=333056 led to the documents shown in Cluster #1 below, "Drug Recall Policy.pdf," "Anti_fraud_and_Fraud_policy.pdf," etc.

Note: Production environments will most likely have a document threshold in the thousands.

Plain text queries
- Plain-text, unbolded queries on the left, under the bold query, are queries that are pulled in or "inherited" based on the document list on the right side
- These are the top queries that users have run to discover and take action on the documents listed on the right.
- In other words, a "backwards-looking" lookup from the documents shown yields these queries, which are listed under the bold, original cluster query
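
The relationship between the bold original query, its documents, and the inherited queries can be sketched as follows. The action log shape, the function names, and the sample data are all invented for illustration; the trainer's actual logic is not shown in this document. The sketch also assumes that the threshold applies to the actions that followed the original query.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical action log: one (query, document) pair per user action.
ACTION_LOG = (
    [("diabetes treatment", "a.pdf")] * 5
    + [("diabetes treatment", "b.pdf")] * 2
    + [("insulin dosage", "a.pdf")] * 3
    + [("blood sugar", "a.pdf")] * 1
    + [("insulin dosage", "c.pdf")] * 4
)


def cluster_documents(log, original_query: str, threshold: int) -> dict[str, int]:
    """Documents the original query led to, kept only when they received
    at least `threshold` actions."""
    counts = Counter(doc for query, doc in log if query == original_query)
    return {doc: n for doc, n in counts.items() if n >= threshold}


def inherited_queries(log, documents, original_query: str, top: int = 3) -> list[str]:
    """Other queries that led to the same documents, most frequent first."""
    counts = Counter(
        query
        for query, doc in log
        if doc in documents and query != original_query
    )
    return [query for query, _ in counts.most_common(top)]


docs = cluster_documents(ACTION_LOG, "diabetes treatment", threshold=4)
print(docs)
print(inherited_queries(ACTION_LOG, docs, "diabetes treatment"))
```

In this invented data, only a.pdf passes the threshold of 4, and the queries "insulin dosage" and "blood sugar" are inherited because they also led to a.pdf.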

Features

Learn-To-Rank results and user segmentation boosting

During a search, this stage checks whether the query matches any of the clusters. If it does, the stage boosts the documents in that cluster according to their hits (number of clicks, previews, etc.). The documents selected for boosting are those that the current query (or a very similar query) led to, or those that were interacted with by users with similar classifications, such as role, location, or job type.

The boost value is proportional to the number of actions on each document that the current query led to, and falls within the 1-100 range.
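
The proportional boost can be illustrated with a short calculation. This is a sketch based on the description in this document (a value between 1 and 100, relative to the document with the most actions in the cluster), not the product's code. The function name, the rounding, and the clamping to a minimum of 1 are assumptions, and the action counts are invented.

```python
from __future__ import annotations


def boost_values(doc_actions: dict[str, int], documents_to_boost: int = 2) -> dict[str, int]:
    """Boost the top N documents of a cluster on a 1-100 scale,
    relative to the document with the most actions."""
    if not doc_actions:
        return {}
    max_actions = max(doc_actions.values())
    if max_actions <= 0:
        return {}
    # Rank documents by number of actions, highest first.
    ranked = sorted(doc_actions.items(), key=lambda item: item[1], reverse=True)
    return {
        doc: max(1, round(100 * actions / max_actions))
        for doc, actions in ranked[:documents_to_boost]
    }


# Invented action counts for one cluster.
cluster = {
    "Drug Recall Policy.pdf": 40,
    "Anti_fraud_and_Fraud_policy.pdf": 20,
    "Other policy.pdf": 5,
}
print(boost_values(cluster))
```

With Number of documents to be boosted set to 2, the top document gets a boost of 100 and the document with half as many actions gets 50; the third document is not boosted.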

To use Learn-to-Rank results and user segmentation boosting:
1. Create a stage with empty parameters.
2. Make the stage the first stage in the list of tuning stages in the "Query Tuning" section.

Learn-To-Rank Query Suggestions

This feature provides suggestions as query text is entered in the search field.

The Learn-To-Rank Query Suggestions provider is located under TypeAhead.

To enable and use this TypeAhead provider:

1. Click the UI Editor link on the SmartHub ADMINISTRATION page. (You must be a SmartHub administrator.)
2. Click the Select a page link in the top menu.
3. Select (double-click) the page (Index.html, landing.html, etc.) that you want to modify.
  - Below, the Results.html page is shown for sample purposes.
  - See How to Use the UI Editor.
4. Select the Customize type ahead link at the top of the page.
  - Type-ahead providers are listed under Settings on the left side.
5. Select the settings gear icon of the LearnToRankSuggestions provider to open the Type-ahead providers settings window.
6. Modify the settings as needed. For details about each setting, see the table Type-Ahead Settings.
7. Click Apply.
8. Click Save changes.

Learn-To-Rank Content Suggestions

This feature provides the user with similar (search) results, excluding those present on the current page.

This is also used in the component Similar Documents.

For more about Content-By-Search, see How Users Can Personalize Their Search Results.
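
The exclusion of results already present on the current page can be pictured as a simple filter. This is a minimal sketch under stated assumptions: the function name, the limit parameter, and the example URLs are hypothetical, and the real module may select and rank similar results differently.

```python
from __future__ import annotations


def similar_results(cluster_documents: list[str], current_page: list[str], limit: int = 5) -> list[str]:
    """Return cluster documents that are not already shown on the page,
    keeping the cluster's order."""
    shown = set(current_page)
    return [url for url in cluster_documents if url not in shown][:limit]


# Invented URLs, for illustration only.
cluster_docs = [
    "https://example.com/docs/a.pdf",
    "https://example.com/docs/b.pdf",
    "https://example.com/docs/c.pdf",
]
page_results = ["https://example.com/docs/b.pdf"]
print(similar_results(cluster_docs, page_results))
```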

Learn To Rank Results Suggestions tuning stage
- To use LTR Results Suggestion, create a stage with empty parameters.
- The stage must be first in the list of stages under the Query Tuning section of the SmartHub Administration UI.
The Learn-To-Rank Similar Results module is located under <SmartHub installation>/modules/LearnToRank.
- In this module a Learn-To-Rank settings file contains the ID of your Content-By-Search (Learn-To-Rank element) and the URL property.
- This ID can be modified.

NOTE:

Be aware of the order of relevancy tuning stages!